Probabilistic Frequent Pattern Growth for Itemset Mining in Uncertain Databases (Technical Report)

Authors

  • Thomas Bernecker
  • Hans-Peter Kriegel
  • Matthias Renz
  • Florian Verhein
  • Andreas Züfle
Abstract

Frequent itemset mining in uncertain transaction databases differs semantically and computationally from traditional techniques applied to standard (certain) transaction databases. Uncertain transaction databases consist of sets of existentially uncertain items. The uncertainty of items in transactions makes traditional techniques inapplicable. In this paper, we tackle the problem of finding probabilistic frequent itemsets based on possible world semantics. In this context, an itemset X is called frequent if the probability that X occurs in at least minSup transactions is above a given threshold τ. We make the following contributions: We propose the first probabilistic FP-Growth algorithm (ProFP-Growth) and associated probabilistic FP-Tree (ProFP-Tree), which we use to mine all probabilistic frequent itemsets in uncertain transaction databases without candidate generation. In addition, we propose an efficient technique to compute the support probability distribution of an itemset in linear time using the concept of generating functions. An extensive experimental section evaluates our proposed techniques and shows that our ProFP-Growth approach is significantly faster than the current state-of-the-art algorithm.
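
The generating-function idea mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the occurrence of X in each transaction is an independent event with a known probability p_i, so that the coefficient of x^k in the product of the factors (1 - p_i + p_i x) equals the probability that the support of X is exactly k. The function names and the truncation to coefficients below minSup (which keeps the cost at O(n · minSup), linear in the number of transactions for a fixed minSup) are choices made here for illustration.

```python
from typing import List, Sequence


def support_distribution(probs: Sequence[float]) -> List[float]:
    """Expand prod_i (1 - p_i + p_i * x); entry k is P(support == k).

    Assumes independent transactions (an assumption of this sketch).
    """
    coeffs = [1.0]
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)   # X absent from this transaction
            nxt[k + 1] += c * p       # X present in this transaction
        coeffs = nxt
    return coeffs


def frequentness_probability(probs: Sequence[float], min_sup: int) -> float:
    """P(support >= min_sup), tracking only coefficients of x^0..x^(min_sup-1)."""
    if min_sup <= 0:
        return 1.0
    coeffs = [1.0] + [0.0] * (min_sup - 1)
    for p in probs:
        # Update from high to low degree so coeffs[k - 1] is still the old value.
        for k in range(min_sup - 1, 0, -1):
            coeffs[k] = coeffs[k] * (1.0 - p) + coeffs[k - 1] * p
        coeffs[0] *= 1.0 - p
    # The retained coefficients sum to P(support < min_sup).
    return 1.0 - sum(coeffs)


def is_probabilistic_frequent(probs: Sequence[float], min_sup: int, tau: float) -> bool:
    """True if X occurs in at least min_sup transactions with probability above tau."""
    return frequentness_probability(probs, min_sup) > tau
```

For example, with two transactions that each contain X with probability 0.5, the support distribution is 0.25, 0.5, 0.25 for supports 0, 1, 2, so X is a probabilistic frequent itemset for minSup = 1 whenever τ is below 0.75.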

Similar resources

Probabilistic Frequent Pattern Growth for Itemset Mining in Uncertain Databases

Frequent itemset mining in uncertain transaction databases semantically and computationally differs from traditional techniques applied on standard (certain) transaction databases. Uncertain transaction databases consist of sets of existentially uncertain items. The uncertainty of items in transactions makes traditional techniques inapplicable. In this paper, we tackle the problem of finding proba...

Mining Frequent Itemsets over Uncertain Databases

In recent years, due to the wide applications of uncertain data, mining frequent itemsets over uncertain databases has attracted much attention. In uncertain databases, the support of an itemset is a random variable instead of a fixed occurrence counting of this itemset. Thus, unlike the corresponding problem in deterministic databases where the frequent itemset has a unique definition, the fre...

Analysis of Frequent Item set Mining on Variant Datasets

Association rule mining is the process of discovering relationships among the data items in large database. It is one of the most important problems in the field of data mining. Finding frequent itemsets is one of the most computationally expensive tasks in association rule mining. The classical frequent itemset mining approaches mine the frequent itemsets from the database where presence of an...

A survey of itemset mining

Itemset mining is an important subfield of data mining, which consists of discovering interesting and useful patterns in transaction databases. The traditional task of frequent itemset mining is to discover groups of items (itemsets) that appear frequently together in transactions made by customers. Although itemset mining was designed for market basket analysis, it can be viewed more generally...

Probabilistic Frequent Itemset Mining on a GPU Cluster

Probabilistic frequent itemset mining, which discovers frequent itemsets from uncertain data, has attracted much attention due to inherent uncertainty in the real world. Many algorithms have been proposed to tackle this problem, but their performance is not satisfactory because handling uncertainty incurs high processing cost. To accelerate such computation, we utilize GPUs (Graphics Processing...

Journal:
  • CoRR

Volume: abs/1008.2300

Publication date: 2010